Exploratory Data Analysis - Microsoft Stock Prices

In this case we are analyzing the Microsoft Stock Price.

We will try to find the best possible model in terms of accuracy, training with different data sizes.

Let's import our libraries to start the analysis:

In [ ]:
import pandas as pd
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, plot_components_plotly
import plotly.express as px
import plotly
plotly.offline.init_notebook_mode()
import os, sys
path = os.getcwd()
path = os.path.dirname(path)
sys.path.append(path)
from train import train, save_model
import datetime as dt
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

The last years of Microsoft stock price was:

In [ ]:
# Loading Microsoft data
microsoft = pd.read_csv('../data/microsoft.csv')
microsoft
Out[ ]:
ds y
0 2001-11-26 32.570000
1 2001-11-27 31.870001
2 2001-11-28 31.400000
3 2001-11-29 32.419998
4 2001-11-30 32.105000
... ... ...
5032 2021-11-19 343.109985
5033 2021-11-22 339.829987
5034 2021-11-23 337.679993
5035 2021-11-24 337.910004
5036 2021-11-26 336.920013

5037 rows × 2 columns

In [ ]:
fig = px.line(microsoft, x='ds', y='y', labels={'ds': 'Date', 'y': 'Price'}, 
                title='Amazon Stock Price 2002-2021')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()

This pattern is not new for us, because it's pretty similar to Amazon and Apple Stocks prices. Flat price to 2015, and then start a big and fast raise.

We want to train the model with all the data, 10 years and 5 years to check the differences:

In [ ]:
# We want to predict 2021 year
microsoft['ds'] = pd.to_datetime(microsoft['ds'])
X_test = microsoft[microsoft['ds'].dt.year == 2021][['ds']]
X_test
Out[ ]:
ds
4809 2021-01-04
4810 2021-01-05
4811 2021-01-06
4812 2021-01-07
4813 2021-01-08
... ...
5032 2021-11-19
5033 2021-11-22
5034 2021-11-23
5035 2021-11-24
5036 2021-11-26

228 rows × 1 columns

1. Training the model with all the data (2001-2020)

In [ ]:
# Full data 2001-2020
X_train_full_data = microsoft[microsoft['ds'].dt.year != 2021]
X_train_full_data
Out[ ]:
ds y
0 2001-11-26 32.570000
1 2001-11-27 31.870001
2 2001-11-28 31.400000
3 2001-11-29 32.419998
4 2001-11-30 32.105000
... ... ...
4804 2020-12-24 222.750000
4805 2020-12-28 224.960007
4806 2020-12-29 224.149994
4807 2020-12-30 221.679993
4808 2020-12-31 222.419998

4809 rows × 2 columns

In [ ]:
# Predictions
model = Prophet()
model.fit(X_train_full_data)
forecast = model.predict(X_test)
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
In [ ]:
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with Full Data')
fig.show()
In [ ]:
# Validating predictions
val = forecast.merge(microsoft, on='ds', how='right')
val = val[['ds', 'yhat', 'y']]
val.columns = ['Date', 'Predicted Price', 'True Price']
val = val[val.Date.dt.year == 2021]
fig = px.scatter(val, x=val.Date, y=val.columns[1:],
                title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
In [ ]:
# Forecast Components
plot_components_plotly(model, forecast)
In [ ]:
# Scores
def scores(y_true, y_pred):
    print('MAE:', mean_absolute_error(y_true, y_pred))
    print('RMSE', np.sqrt(mean_squared_error(y_true, y_pred)))

y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred =  forecast['yhat']
scores(y_true, y_pred)
MAE: 49.90572580249322
RMSE 55.18741852131982
In [ ]:
print('Mean Microsoft Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 49.9 / val['True Price'].mean())
Mean Microsoft Price in 2021: $ 269.98
Score:  0.8151711336149684

Our first score was 0.81, which is good. This is because the price in 2021 keeps following the trend, and our model is able to follow the price in a reasonable way. Let's try with few years of data:

2. Training the model with 10 years of data (2010-2020)

In [ ]:
# Training data - 10 years
X_train_last_ten = microsoft[(microsoft.ds.dt.year >= 2010) & (microsoft.ds.dt.year <=2020)]
X_train_last_ten
Out[ ]:
ds y
2040 2010-01-04 30.950001
2041 2010-01-05 30.959999
2042 2010-01-06 30.770000
2043 2010-01-07 30.450001
2044 2010-01-08 30.660000
... ... ...
4804 2020-12-24 222.750000
4805 2020-12-28 224.960007
4806 2020-12-29 224.149994
4807 2020-12-30 221.679993
4808 2020-12-31 222.419998

2769 rows × 2 columns

In [ ]:
# Predictions
model = Prophet()
model.fit(X_train_last_ten)
forecast2 = model.predict(X_test)
fig = plot_plotly(model, forecast2, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with last 10 years of Data')
fig.show()
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
In [ ]:
# Validating predictions
val2 = forecast2.merge(microsoft, on='ds', how='right')
val2 = val2[['ds', 'yhat', 'y']]
val2 = val2[val2.ds.dt.year == 2021]
val2.columns = ['Date', 'Predicted Price', 'True Price']
fig = px.scatter(val2, x=val2.Date, y=val2.columns[1:], labels={'ds': 'Date'},
                title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
In [ ]:
plot_components_plotly(model, forecast2)
In [ ]:
y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred =  forecast2['yhat']
scores(y_true, y_pred)
MAE: 26.500761891969713
RMSE 32.649428350973814
In [ ]:
print('Mean Microsoft Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 26.5 / val['True Price'].mean())
Mean Microsoft Price in 2021: $ 269.98
Score:  0.9018443895951235

With half of the data we've got a pretty nice score of 0.90. The model keeps close of the price, again, because the price is following the upper trend, but our results was really good.

Maybe traning with few data can improve even more our results, let's check it:

3. Training with 5 years of data (2015-2020)

In [ ]:
# Training data - 5 years
X_train_last_five = microsoft[(microsoft.ds.dt.year >= 2015) & (microsoft.ds.dt.year <=2020)]
X_train_last_five
Out[ ]:
ds y
3298 2015-01-02 46.759998
3299 2015-01-05 46.330002
3300 2015-01-06 45.650002
3301 2015-01-07 46.230000
3302 2015-01-08 47.590000
... ... ...
4804 2020-12-24 222.750000
4805 2020-12-28 224.960007
4806 2020-12-29 224.149994
4807 2020-12-30 221.679993
4808 2020-12-31 222.419998

1511 rows × 2 columns

In [ ]:
# Predictions
model = Prophet()
model.fit(X_train_last_five)
forecast3 = model.predict(X_test)
fig = plot_plotly(model, forecast3, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with last 5 years of Data')
fig.show()
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
In [ ]:
# Validating predictions
val3 = forecast3.merge(microsoft, on='ds', how='right')
val3 = val3[['ds', 'yhat', 'y']]
val3 = val3[val3.ds.dt.year == 2021]
val3.columns = ['Date', 'Predicted Price', 'True Price']
fig = px.scatter(val3, x=val3.Date, y=val3.columns[1:],
                title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
In [ ]:
plot_components_plotly(model, forecast3)
In [ ]:
# Scores
y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred =  forecast3['yhat']
scores(y_true, y_pred)
MAE: 14.615608409068535
RMSE 19.720572368510673
In [ ]:
print('Mean Microsoft Price in 2021: $', round(val['True Price'].mean(), 2))
print(f'Score: ', 1 - 14.61 / val['True Price'].mean())
Mean Microsoft Price in 2021: $ 269.98
Score:  0.9458847747918775

We can see here how beautiful our predictions are with 5 years of data, specially the first half of the year, with our predictions really close of the price. It's amazing how our predictions goes down in April with the price.

The score we have here is 0.94, with a 14.61 of MAE and less than 20 RMSE. Very good results.

Results

The best performance was with 5 years of traning data.

In [ ]:
results = pd.DataFrame(
    {'MAE': [49.9, 26.5, 14.61], 
    'RMSE': [55.18, 32.64, 19.72], 
    'Train Data': ['All the data', 'Last 10 years', 'Last 5 years']})
In [ ]:
fig = px.bar(results, x='Train Data', y=['MAE', 'RMSE'], barmode='group', 
        title='Train MAE: All Data vs Last 5 and 10 Years (Less is Better)')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
In [ ]:
val['Last 5 Years'] = val2['Predicted Price']
val['Last 10 Years'] = val3['Predicted Price']
val = val.rename(columns={'Predicted Price': 'All the Data',  'y': 'True Price', 'ds': 'Date'})
fig = px.line(val, x='Date', y=val.columns[1:], title='Microsoft Stock Predictions: Train with all the Data vs Train with Last 5 and 10 years')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()

Training a model to make predictions in Microsoft Stocks

Let's train our final model to make future predictions in Microsoft Stocks.

In [ ]:
model = train('microsoft', '../data/microsoft.csv', False, True, len(X_train_last_five))
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
In [ ]:
# Making future predictions with the model: two years

# 1. Creating the forecast Dates
X_test_future = []
end = dt.datetime.strptime('2023-12-31', '%Y-%m-%d').date()
start = dt.datetime.strptime('2021-11-20', '%Y-%m-%d').date()

for i in range((end-start).days):
    X_test_future += [(start+timedelta(i)).strftime('%Y-%m-%d')]

X_test_future = pd.DataFrame(X_test_future)
X_test_future.columns = ['ds']
X_test_future
Out[ ]:
ds
0 2021-11-20
1 2021-11-21
2 2021-11-22
3 2021-11-23
4 2021-11-24
... ...
766 2023-12-26
767 2023-12-27
768 2023-12-28
769 2023-12-29
770 2023-12-30

771 rows × 1 columns

In [ ]:
# 2. Making predictions: 2 years
forecast = model.predict(X_test_future)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='Microsoft Stocks - Two Years Forecasting')
fig.show()

Final Step: Saving the model

Finally, we save our model:

In [ ]:
# Saving the model
save_model('../models', model, 'microsoft')
Model Succesfully Saved in: 
../models/microsoft.json

Conclussions

In this case, we were able to predict nicely the 2021 price of Microsoft. The price didn't change the trend, that's why we made predictions with 0.94 score.

This proves that we can make predictions over any stock, only if the price keeps the last years trend.

In the next notebook we will analyze the last stock, Tesla, which is raising their price very fast in the last months.